Data Compression in Database Query Processing
Author
Abstract
Row-oriented databases (“row-stores”) use data compression methods such as dictionary encoding to reduce I/O cost by shrinking the data. However, row-stores face two limitations when applying compression schemes: (1) they can encode only one value at a time, and (2) they must pay the decompression cost during query processing. These shortcomings limit the use of data compression in row-oriented databases. In contrast, column-oriented databases (“column-stores”) offer more opportunities for compression because the values of the same attribute are stored consecutively. A column-store can use compression schemes that encode multiple values spanning multiple rows at once, which is not possible in a row-store. In addition, column-stores can sometimes execute queries directly on compressed data without decompressing it. This yields the largest performance gain: I/O cost drops because less data is read, and the decompression cost is avoided entirely. However, column-stores do not consider heavyweight compression schemes, despite their high compression ratios, because such schemes cannot support random access. This research exam report surveys compression techniques for both row-stores and column-stores. For each compression scheme, we first review the existing work and then present our ideas. We also introduce a class of queries that both row-stores and column-stores miss key opportunities to answer efficiently, and we present a new data structure that supports both lightweight and heavyweight compression and answers these queries efficiently.
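To make the contrast between single-value and multi-value encoding concrete, here is a minimal sketch, assuming a single in-memory column held as a Python list. Dictionary encoding replaces each value with a small integer code, one value at a time. Run-length encoding (RLE) is used here only as one common example of a scheme that encodes many consecutive values at once; the report's abstract does not name a specific scheme. The query `COUNT(*) WHERE state = 'NY'` is then answered directly on the runs without decompressing them. All function names and the sample data are hypothetical and are not taken from the report.

```python
from itertools import groupby


def dictionary_encode(values):
    """Map each distinct value to a small integer code, one value at a time."""
    dictionary = {}
    codes = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)  # next unused code
        codes.append(dictionary[value])
    return dictionary, codes


def rle_encode(column):
    """Collapse runs of equal consecutive values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the full column."""
    return [value for value, length in runs for _ in range(length)]


def count_equal_rle(runs, target):
    """COUNT(*) WHERE col = target, evaluated on the runs without decompressing."""
    return sum(length for value, length in runs if value == target)


# Hypothetical sorted column, as a column-store might keep it on disk.
state = ["CA"] * 4 + ["NY"] * 3 + ["WA"] * 2

dictionary, codes = dictionary_encode(state)  # one code per row
runs = rle_encode(codes)                      # one pair per run of rows

print(dictionary)                               # {'CA': 0, 'NY': 1, 'WA': 2}
print(codes)                                    # [0, 0, 0, 0, 1, 1, 1, 2, 2]
print(runs)                                     # [(0, 4), (1, 3), (2, 2)]
print(count_equal_rle(runs, dictionary["NY"]))  # 3
```

The predicate touches three runs instead of nine values, so the work scales with the number of runs rather than the number of rows. This only pays off when equal values sit next to each other, which is why such multi-value schemes fit column-stores and not row-stores.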
Similar resources
Choosing the Most Suitable Query Language for Using Outer Joins to Extract Data in Datalog Mode in the DES Deductive Database System
Deductive database systems are designed on a logical data model. Unlike relational database management systems (RDBMS), which store data in tables, a deductive database system stores data as facts. The Datalog Educational System (DES) is a deductive database system whose default mode is Datalog mode. It can extract data using outer joins with three query la...
Relational Databases Query Optimization using Hybrid Evolutionary Algorithm
Optimizing database queries is a hard research problem. Exhaustive search techniques such as dynamic programming are suitable for queries with few relations, but as the number of relations in a query grows, they require too much memory and processing to be practical, so randomized and evolutionary methods must be used instead. The use of evolutionary methods, beca...
Data Compression and Database Performance
Data compression is widely used in data management to save storage space and network bandwidth. In this report, we outline the performance improvements that can be achieved by exploiting data compression in query processing. The novel idea is to leave data in a compressed state as long as possible, and to only uncompress data when absolutely necessary. We will show that many query processing algo...
An Effective Path-aware Approach for Keyword Search over Data Graphs
Keyword search is known as a user-friendly alternative to structured query languages for retrieving information from graph-structured data. Efficiently retrieving the answers relevant to a keyword query and effectively ranking them by relevance are the two main challenges of keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...
Compression-Aware In-Memory Query Processing: Vision, System Design and Beyond
In-memory database systems have to keep base data as well as intermediate results generated during query processing in main memory. In addition, the effort to access intermediate results is equivalent to the effort to access the base data. Therefore, the optimization of intermediate results is interesting and has a high impact on the performance of the query execution. For this domain, we propose...
Improving Compression Efficiency of Data Warehouse
Data compression plays a paramount role in data warehouses by reducing data size and improving query processing. Different compression techniques are feasible at different levels; each type offers either a good compression ratio or suitability for query processing. This paper focuses on applying lossless and lossy compression techniques to relational databases. The proposed technique is used at att...
Publication date: 2016